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A  CANONICAL  CORRELATIONS  APPROACH  TO 
MULTISCALE  STOCHASTIC  REALIZATION 

William  W,  Irving  Alan  S.  Willsky 


Abstract 

We  develop  a  realization  theory  for  a  class  of  inultiscale  stochastic  processes  having  white- 
noise  driven,  scale-recursive  dynamics  that  are  indexed  by  the  nodes  of  a  tree.  Given  the 
correlation  structure  of  a  1-D  or  2-D  random  process,  our  methods  provide  a  systematic  way  to 
realize  the  given  correlation  as  the  finest  scale  of  a  multiscale  process.  Motivated  by  Akaike’s  use 
of  canonical  correlation  analysis  to  develop  both  exact  and  reduced-order  model  for  time-series, 
we  too  harness  this  tool  to  develop  multiscale  models.  We  apply  our  realization  scheme  to  build 
reduced-order  multiscale  models  for  two  applications,  namely  linear  least-squares  estimation  and 
generation  of  random-field  sample  paths.  For  the  numerical  examples  considered,  least-squares 
estimates  are  obtained  having  nearly  optimal  mean-square  errors,  even  with  multiscale  models 
of  low  order.  Although  both  field  estimates  and  field  sample  paths  exhibit  a  visually  distracting 
blockiness,  this  blockiness  is  not  an  important  issue  in  many  applications.  For  such  applications, 
our  approach  to  multiscale  stochastic  realization  holds  promise  as  a  valuable,  general  tool. 


‘The  work  of  William  Irving  and  .Alan  Willsky  was  supported  by  the  Air  Force  Office  of  Scientific  Research,  under 
grant  AFOSR-F496-20-93-1-0604,  the  .Army  Research  Office,  under  grant  .ARO  DAAL03-92-G-0015. 
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1  Introduction 


A  class  of  stochastic  processes  indexed  by  the  nodes  of  a  tree  was  introduced  in  [4].  These  processes 
have  white-noise  driven,  scale-recursive  dynamics,  directly  analogous  to  the  time-recursive  dynam¬ 
ics  of  Gauss-Markov  time-series  models.  Experimental  and  theoretical  results  have  demonstrated 
that  this  class  of  processes  is  cpiite  rich  statistically;  the  self-similarity  of  fractional  Brownian  mo¬ 
tion  can  be  represented  [4],  as  can  any  given  1-D  wide-sense  (WS)  reciprocal  process  or  2-D  Markov 
random  field  (WSMRF)  [12].^  Complementing  this  statistical  richness  are  the  efficient  image  pro¬ 
cessing  algorithms  to  which  multiscale  models  lead.  For  example,  a  scale-recursive  algorithm  has 
been  developed  that  directly  generalizes  the  Kalman  filter  and  related  Rauch- Tung-Striebel  (RTS) 
smoother  [4].  This  algorithm  incorporates  noisy  measurements  of  a  given  multiscale  process  to 
calculate  both  smoothed  estimates  and  associated  error  covariances.  Another  algorithm  has  been 
developed  for  likelihood  calculation  [11].  In  contrast  to  traditional  2-D  optimal  estimation  for¬ 
mulations  based  on  Markov  random  fields,  which  have  a  per-pixel  computational  complexity  that 
typically  grows  with  image  size,  these  multiscale  algorithms  have  a  per-pixel  complexity  indepen¬ 
dent  of  image  size.  Substantial  computational  savings  can  thus  result,  as  evidenced  by  the  work 
in  [13]  on  calculating  optical  flow  and  the  work  in  [7]  on  interpolation  of  sea  level  variations  in  the 
North  Pacific  Ocean  from  satellite  measurements. 

Just  as  Kalman  filtering  requires  the  prior  specification  of  a  state-space  model,  so  does  multiscale 
statistical  processing.  In  this  paper,  we  develop  a  general  approach  for  building  multiscale  models. 
Given  the  correlation  structure  of  a  1-D  or  2-D  random  process’  our  methods  provide  a  systematic 
way  to  realize  the  given  correlation  as  the  finest  scale  of  a  multiscale  process.  Because  there  is 
typically  a  conflict  between  model  complexity  and  accuracy,  we  mainly  focus  on  the  case  where 
a  constraint  is  imposed  on  the  allowed  model  state  dimension;  the  objective  then  is  to  build  a 
model  whose  finest-scale  correlation  structure  best  matches  the  desired  correlation,  subject  to  the 

^The  definition  and  properties  of  wide-sense  reciproc.al  processes  and  WSMRFs  are  nicely  summarized  in  [5]. 

■^The  terminology  1-D,  2-D  or  multidimensional  random  process  is  used  here  to  indicate  that  the  dimension  of  the 

independent  variable  of  the  process  is  1-D,  2-D.  or  multidimensional. 
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dimension  constraint.  In  general,  our  focus  on  realizing  finest-scale  statistics  is  motivated  by  the 
not  insignificant  class  of  applications  in  which  the  finest-scale  is  really  the  only  one  of  interest.  For 
instance,  in  many  de-noising  applications,  the  finest-scale  process  is  a  pixel-by-pi.xel  representation 
of  the  image,  the  measurements  are  noisy  observations  of  each  pixel,  and  the  objective  is  to  estimate 
the  value  of  each  image  pixel.  For  such  proldems.  the  multiscale  fraihework  provides  an  efficient 
statistical  approach  for  obtaining  estimates  and  error  covariances,  even  though  every  other  aspect 
of  these  problems  involves  only  the  finest  scale. 

There  is  a  close  relationship  between  the  multiscale  stochastic  realization  problem  and  its  more 
traditional,  time-series  counterpart.  This  relationship  can  be  made  clear  once  the  Markov  property 
of  multiscale  processes  is  noted.  To  describe  the  Markov  property  of  multiscale  processes,  we 
first  observe  that  in  a  cy-th  order  tree  each  node  has  q  children  and  a  single  parent,  and  hence 
partitions  the  remaining  nodes  into  (c/  +  1)  subtrees,  one  associated  with  each  of  these  child  and 
parent  nodes.  (A  second-order  tree,  which  is  often  used  to  index  multiscale  representations  of  time 
series,  is  illustrated  in  Figure  1.)  Now,  the  Markov  property  states  that  if  x{s)  is  the  value  of  the 
state  at  node  s,  then  conditioned  on  x{s}  the  states  in  the  corresponding  (g  + 1)  subtrees  of  nodes 
extending  away  from  s  are  uncorrelated.  The  connection  to  the  time-series  realization  problem  is 
that  in  both  contexts,  the  role  of  state  information  is  to  provide  an  information  interface  among 
subsets  of  the  process.  This  interface  must  store  just  enough  process  information  to  make  the 
corresponding  process  subsets  conditionally  uncorrelated.  In  the  time-series  case,  this  interface  is 
between  two  subsets  of  the  process  (i.e.,  the  past  and  the  future),  while  in  the  multiscale  case,  the 
interface  is  among  multiple  (i.e..  (g  -I- 1))  subsets  of  the  process. 

We  exploit  the  connection  between  the  time-series  and  multiscale  realization  problems  by  adapt¬ 
ing  to  the  multiscale  context  work  done  in  [1]  and  [2]  on  reduced-order  time-series  modeling.  The 
work  in  [1]  and  [2]  addresses  two  issues.  First,  for  exact  realizations,  a  method  is  devised  for  finding 
the  minimal  dimension  and  corresponding  information  content  of  the  state.  Second,  for  reduced- 
order,  approximate  realizations,  a  method  is  devised  for  measuring  the  relative  importance  of  the 
components  of  the  information  interface  provided  by  the  state,  so  that  a  decision  can  be  made 
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about  which  components  to  discard  in  a  reduced-order  realization.  The  latter  is  accomplished 
using  a  classical  tool  from  multivariate  statistics,  nameh-  canonical  correlation  analysis  [8].  We 
decompose  our  multiscale  problem  of  decorrelating  jointly  (f/  +  1)  proce.ss  snlxsets  into  a  collection 
of  q  problems  of  decorrelating  pairs  of  process  sub.sets.  We  then  demonstrate  that  with  respect  to 
a  particular  decorrelation  metric,  canonical  correlation  analysis  can  in  principle  be  used  to  solve 
optimally  each  of  the  pairwise  decorrelation  problems.  Furthermore,  these  pairwise  solutions  can  be 
concatenated  to  yield  a  sub-optimal  solution  to  the  multi-way  decorrelation  problem.  The  solution 
to  this  decorrelation  problem  leads  readily  to  values  for  all  the  multiscale  model  parameters. 

We  apply  our  realization  scheme  to  build  reduced-order  multiscale  models  for  two  applications, 
namely  linear  least-squares  estimation  and  generation  of  random-field  sample  paths.  For  the  nu¬ 
merical  examples  considered,  we  obtain  least-squares  estimates  having  mean-square  errors  that  are 
nearly  optimal,  even  with  multiscale  models  of  very  low  order.  Although  both  field  estimates  and 
field  sample  paths  exhibit  a  visually  distracting  blockiness,  this  blockiness  is  not  really  an  issue  in 
many  applications.  For  such  applications,  our  approach  to  multiscale  realization  holds  promise  as 
a  valuable,  general  tool. 

This  paper  is  organized  in  the  following  way.  In  Section  2,  the  multiscale  framework  is  more 
formally  introduced,  and  a  measure  of  decorrelation  is  defined.  In  Section  3,  the  modeling  problem 
is  precisely  formulated,  and  a  solution  is  overviewed  for  the  case  that  full-order,  exact  models  are 
sought.  In  Section  4,  the  solution  to  the  modeling  problem  is  presented  for  the  more  challenging 
case  that  reduced-order,  approximate  models  are  sought.  In  Section  5,  two  numerical  examples  are 
presented,  and  finally  in  Section  6,  a  .summary  is  provided,  together  with  suggestions  for  future 
work.  Details  of  the  proofs  are  relegated  to  appendices  at  the  end  of  the  paper. 


4 


Root  node 


Figure  1:  The  first  four  levels  of  a  2-nd  order  tree  are  shown.  The  parent  of  node  s  is  denoted  by  sj  and  the  two 
offspring  are  denoted  by  sa\  and  sa-i-  The  random  vectors  ifs  and  fs'  contain,  respectively,  the  finest-scale  state 
information  that  does  and  does  not  descend  from  the  node  .s. 

2  Preliminaries 

2.1  State-Space  Models  on  ^-th  Order  Trees 

The  models  introduced  in  [4,  13]  describe  multiscale  stochastic  processes  indexed  by  nodes  on  a  tree. 
A  g^^-order  tree  is  a  pyramidal  structure  of  nodes  connected  such  that  each  node  has  q  offspring 
nodes.  We  associate  with  each  node  s  a  vector- valued  state  .c(s),  where  in  general,  the  state 
vectors  at  the  m-th  level  of  the  tree  (for  m  =  0.  1,. . . )  can  be  interpreted  as  information  about 
the  m-th  scale  of  the  process.  In  keeping  with  the  conventions  established  in  [4,  13],  we  define  an 
upward  (fine-to-coarse)  shift  operator  7  such  that  57  is  the  parent  of  node  s,  and  a  corresponding 

set  of  downward  (coarse-to-fine)  shift  operators  a;,  i  =  1,2 . q,  such  that  the  q  offspring  of  node 

s  are  given  by  .scvi,  sao,  ■  ■  ■ ,  sa,,.  Figure  1  depicts  an  example  of  the  relative  locations  of  s,  sy,  and 
SQ:i,sa2  in  a  second  order  tree. 

The  dynamics  implicitly  providing  a  statistical  characterization  of  x(s)  have  the  form  of  an 
autoregression  in  scale: 

x(s)  =  A{s)x{s"/)  +  w(s).  (1) 


0 


This  regression  is  initialized  at  the  root  node,  =  0.  with  a  state  variable  .rfO)  having  mean  zero 
and  covariance  P(0).  The  term  iri'.i)  represents  white  driving  noise,  vincorrelated  across  scale  and 
space,  and  also  uncorrelated  with  the  initial  condition  .r(0);  this  noise  is  assumed  to  have  mean 
zero  and  covariance  Qis).  Since  .rfO)  and  u'fs)  are  zero-mean,  it  follows  that  xfs)  is  a  zero-mean 
random  process^.  Furthermore,  since  the  driving  noise  w(s)  is  white,  the  correlation  structure  of 
the  process  x(s)  is  characterized  completely  by  P(0)  and  the  autoregression  parameters  .-1(5)  and 
Q(s)  for  all  nodes  s  0. 

The  statistical  structure  of  multiscale  processes  can  be  exploited  to  yield  an  extremely  efficient 
algorithm  for  estimating  ;r(-),  based  upon  noisy  observations  y(-).  These  observations  take  the  form 

,(/(.s)  =  C(.'i).v(.v)  -r  r(s), 

where  the  noise  v(s)  is  white,  has  covariance  R(3),  and  is  uncorrelated  with  x(-)  at  all  nodes 
on  the  tree.  Just  like  the  Kalman  filter  and  the  RTS  smoother,  this  estimation  algorithm  has 
a  recursive  structure,  and  yields  both  state  estimates  and  associated  error  covariances.  For  a 
multiscale  process  having  states  of  dimension  A  and  indexed  on  a  tree  with  N  nodes,  the  number 
of  required  computations  is  (P(.VA^).  Thus,  the  algorithm  is  quite  efficient,  particularly  when  the 
dimension  of  the  state  vectors  is  low. 

2.2  Markov  Property  of  Multiscale  Processes 

Multiscale  processes  possess  an  important  Markov  property,  stemming  directly  from  the  whiteness 
of  w(s).  We  here  describe  a  special  form  of  this  property,  closely  related  to  our  main  focus  in  this 
paper,  namely  the  finest-scale  of  multiscale  processes.  To  proceed,  we  associate  with  each  tree  node 
s  a  set  Ps,  where  Ps  contains  all  of  the  finest-scale  nodes  that  descend  from  s.  We  also  associate 
with  each  node  s  the  random  vectors  and  The  random  vector  contains  the  elements  of  the 
set  cr  e  Ps},  stacked  into  a  vector,  while  contains  the  elements  of  the  complementary 

set  {x(o'):  cr  €  JPo}n{.x(<T);  a  G  Pj}''.  .stacked  into  a  vector.  It  will  sometimes  prove  convenient 

^The  mean  of  a;(-)  can  be  set  to  any  arbitrary  value,  by  suitably  adjusting  the  mean  of  xfO)  and  w(-). 
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to  denote  by  've  freely  use  both  forms.  To  relate  ^.5  and  .r((7),  we  introduce  the  matrix 

where  Hs\„x{(7)  is  the  linear  least-squares  estimate  of  given  x{a).  These  conventions  are 


illustrated  in  Figure  1. 

The  Markov  property,  as  it  relates  explicitly  to  the  finest  scale,  can  now  be  stated  as  follow 


(.  \  (  u  \  ^ 


C 

1 

S.sa,|s 

II 

0 

== 

TT 

^  ia  jjs 

; 

x{s)  -b 

^SQ!2b 

L  j 

i  ^sa,+i|s  / 

(2) 


with 

■i’(-s)-  . S..a,,+i|s  uiicorrelated. 


(3) 


We  use  this  property  to  rehxte  the  dimension  of  x{s)  to  the  correlation  among  the  vectors  ^sqi, 
<^sQ2>  •  •  •  >  ^5Q,+i-  Towards  this  end.  (2)  and  (3)  together  imply  that 


j  ^saj 


^sai\sPx{s}^saj\s  ^  j)- 


(4) 


(Here  and  elsewhere,  we  adhere  to  the  notational  convention  that  is  the  covariance  of  random 
vector  x  and  P^y  is  the  cross-covariance  of  random  vectors  x  and  y).  By  elementary  linear  algebra 
[18],  the  rank  of  the  cross-covariance  in  (4)  is  upper-bounded  by  the  rank  of  P^i^s)^  which  in  turn  is 
upper-bounded  by  the  dimension  of  x{s).  The  following  proposition  thus  follows: 


Proposition  1 

dim.ev.siOJi{x{s))  >  xna,x  rank  I  Pc ]  . 

i:£j  \  *  / 

If  the  finest-scale  covariance  Pc^  must  match  exactly  some  prespecified  covariance,  then  this 
proposition  provides  a  lower  bound  on  the  required  multiscale  state  dimension.  In  the  rather  likely 
case  that  the  involved  cross-covariance  matrices  have  full  rank,  this  dimension  constraint  becomes 
quite  stringent.  Thus,  to  keep  the  multiscale  estimation  algorithm  computationally  efficient,  we 
find  considerable  motivation  to  turn  to  reduced-order  (approximate)  realizations. 


2.3  The  Generalized  Correlation  Coefficient 


For  the  purposes  of  developing  reduced-order  models,  it  will  prove  convenient  to  have  a  scalar 
measure  of  the  correlation  among  a  collection  of  random  vectors.  We  thus  introduce  a  .so-called 
generalized  correlation  coefficient.  In  keeping  with  .standard  conventions,  we  define  as  follows  the 
correlation  coefficient  pigi-rjo)  between  two  scalar  valued  random  variables  gi  and  gn'- 


pim-.m) 


^’1 1  v-y 


if  P,,,  >0,  for  z  =  1,  2, 


0  '  otherwise. 

Then,  for  a  pair  of  vector- valued  random  variables  ?/i  and  772,  we  define  their  generalized  correlation 
coefficient  p{rji,g2)  by 


pinx.in)  =  max{p(/f77i./.r7/2)} 

where  the  dummy  argument  /;  (for  /  =  1.  2)  is  a  column  vector  having  the  same  dimension  as  pi. 
Finally,  we  extend  the  definition  of  p(-,')  to  a  collection  of  random  vectors  771,772,- -•  iti  tbe 
following  way: 


p{77i,  772....  ,  77fc)  =  max/9(77i,77j). 

Each  of  these  correlation  coefficients  has  a  conditioned  version.  To  define  them,  we  first  let 
random  vector  z  contain  the  conditioning  information.  Also,  we  let  fn  =  pi—  E{pi\z),  where  (here 
and  elsewhere)  we  adhere  to  the  convention  that  E(.ti7/)  is  the  linear  least-squares  estimate  of  x 
given  y.  Finally,  we  define 

P(77i.  772,...  ,  77fc  i  z)  =  p(r/i,  772,...  ,%).  (6) 


3  Formulation  and  Initial  Investigation  of  Realization  Problem 

The  realization  problem  of  interest  in  this  paper  is  to  build  a  multiscale  model,  indexed  on  a 
given  tree  structure,  to  realize  some  pre-specified,  finest-scale  covariance.  We  begin  with  a  random 
vector  xo,  having  the  pre-specified  covariance  and  having  an  established  correspondence  with 
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the  finest  scale  of  the  given  tree.  For  example.  \u  might  be  a  random  field  (written  for  simplicity  ;is 
a  vector),  with  the  finest  scale  of  rhe  rree  (e.g..  a  quadtree)  being  a  pixel-by-pixel  representation  of 
the  field.  Our  objective  is  to  specify  values  for  the  model  parameters  P(0).  .-i(-)  and  Q(-).  to  achieve 
the  best  match  possible  between  the  actual,  realized  covariance  Pcg  and  the  desired  covariance  P^g. 
Because  the  desirable  model  properties  of  low  dimension  and  high  fidelity  are  typically  in  conflict, 
we  impose  a  dimension  constraint:  for  all  .s.  the  state  vector  x{s)  is  constrained  to  have  dimension 
no  greater  than  A^: 


dimension  (.r(.b-))  >  A.,.  (7) 

3.1  Full-Order,  Exact  Realizations 

When  the  dimension  constraint  is  discarded,  the  realization  problem  becomes  conceptually  simpler 
and  exact  realizations  (i.e.,  realizations  for  which  P^g  =  P.^g)  become  possible.  We  begin  by 
analyzing  this  case. 

A  notable  class  of  multiscale  processes  in  this  context  is  those  in  which  each  state  variable  x{s) 
is  a  linear  function  of  the  finest-scale  process; 

x'(s)  =  (8) 

A  state  vector  a;(s)  obeying  this  relationship  can  clearly  be  seen  to  represent  an  aggregate  (coarse) 
description  of  the  finest-scale  process  descending  from  s.  We  refer  to  the  matrix  Wg  associated 
with  node  s  as  the  node’s  internal  matrix,  and  we  refer  to  multiscale  processes  for  which  (8)  holds 
Vs  as  internal  realizations.  The  notion  of  internal  stochastic  realizations  is  standard  in  time-series 
analysis  [10,  16],  with  our  use  of  the  concept  representing  a  natural  adaptation. 

Our  interest  in  internal  realizations  stems  from  the  convenient  fact  that  the  model  parameters 
P(0),  A(-),  and  Q{-)  can  be  specified  completely  in  terms  of  the  internal  matrices  and  the  finest- 
scale  covariance.  In  other  words,  once  values  values  for  the  internal  matrices  have  been  determined, 
values  for  the  model  parameters  P(0),  .4(-)  and  Q{-)  follow  easily.  To  see  this  fact,  we  begin  by 


9 


substituting  (8)  evaluated  at  =  0  into  P(())  =  E  [./■{O)x'^(O)]  to  yield 

=  (9) 

The  parameters  A(s)  and  Q(s)  can  then  l^e  computed  by  noting  that  (1)  represents  the  linear 
least-squares  prediction  of  x(*)  based  upon  .r{sv).  plus  the  associated  prediction  error: 

x(d)  =  E  [xlsj  i  x(.s7)]  +  w{.i)  (10) 

Comparing  (1)  and  (10),  and  using  standard  results  from  linear  least-squares  estimation,  the  model 
parameters  can  be  seen  to  satisfy  the  following  relations: 

=  Px{s)x{S-/}Pj.(sy) 

<?(*')  =  Px{s)  ~  ^x(s)x(sj}Pj;(^sy)Px{s)x{s^)-  (Ifb) 

Finally,  again  using  (8),  the  covariances  appearing  in  (11)  can  be  expressed  as  simple  functions  of 
the  internal  matrices  and  the  finest-scale  covariance: 

PxisMsy)  =  W,P^.iJV]:  (12a) 

Pxis)  =  W,P^^Wj.  (12b) 

Clearly,  the  key  to  constructing  an  exact,  internal  realization  of  a  given  finest-scale  covariance 
is  to  devise  the  internal  matrices.  At  the  finest-scale  nodes,  these  matrices  are  implicitly  defined 
by  the  association  between  finest-scale  nodes  and  the  components  of  xo;  for  example,  if  each  scalar 
component  of  xo  is  assigned  to  a  distinct  finest-scale  node,  then  clearly  Ws  =  1  for  each  finest-scale 
node.  At  the  coarse-scale  nodes  (i.e.,  all  the  nodes  above  the  finest  scale),  the  Markov  property  of 
multiscale  processes  becomes  key.  In  particular,  a  necessary  condition  for  (2)  and  (3)  to  hold  at  a 
coarse-scale  node  s  in  an  exact,  internal  model  is  that  Wg  fulfill  the  following  decorrelating  role: 

P  ( Xso  1  •  XsO  ) . Xa<v,,  +  i  I  11  s  Vs)  (T^) 

In  essence,  the  rows  of  IFs  must  contain  suitable  linear  combinations  of  the  random  vector  Xs,  such 
that  conditioned  on  the  random  vectors  Xsrti>-  -  •  \sa,+i  all  uncorrelated.  Conversely, 
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suppose  that  for  a  desired  covariance  a  matrix  U',  satisfying  (13)  is  found  for  each  coarse-scaie 
node.'*  Suppose,  further,  that  the  resulting  IT'.,  matrices  are  substituted  into  (9),  (11)  and  (12)  to 
calculate  values  of  P(0).  .-4(-)  and  Q(-).  Then  the  resulting  inultiscale  model  will  have  the  desired 
finest-scale  covariance  (i.e..  Pc^  =  P,.)- 

In  summary,  there  is  a  three-stage  procedure  for  realizing  exactly  any  desired  finest-scale  co- 
variance:  (i)  establish  a  correspondence  between  finest-scale  nodes  and  components  of  the  vector 
Xo,  thereby  implicitly  specifying  11,  for  each  finest-scale  node,  (ii)  find  a  matrix  Wg  satisfying 
(13)  for  every  coarse-scaie  node,  and  finally  (hi)  substitute  the  resulting  Wg  values  into  (9),  (11) 
and  (12)  to  calculate  values  for  P(0).  .4(-)  and  Q(-).  A  very  attractive  feature  of  this  procedure 
is  that  it  decomposes  the  realization  problem  into  a  collection  of  independent  sub-problems,  each 
myopically  focused  on  determining  the  information  content  of  the  state  vector  at  a  single  node  to 
fulfill  the  decorrelating  role  (13).  We  hasten  to  add,  however,  that  the  resulting  state  vectors  will 
typically  have  an  impractically  high  dimension,  and  thus  this  construction  is  mainly  of  interest  for 
motivating  our  approach  to  reduced-order  modeling. 

3.2  Reduced- Order,  Approximate  Realizations 

For  the  rest  of  the  paper,  we  reinstate  the  constraint  (7)  on  multiscale  model  state  dimension. 
With  this  constraint  in  effect.  Proposition  1  shows  that  exact  equality  between  and  will 
in  general  be  impossible  to  achieve.  Therefore,  we  no  longer  look  for  Wg  matrices  that  fulfill  the 
decorrelation  condition  (13)  exactly;  instead,  we  look  for  matrices  that  do  the  best  decorrelation 
job  possible,  subject  to  the  dimension  constraint. 

To  describe  the  Wg  condition  we  use  in  lieu  of  (13),  we  must  first  introduce  some  notation. 
We  define  the  random  vectors  \,  and  Vs<^  to  have  the  same  relation  to  .yo  as  ^g  and  .^5=  have  to 
To  be  more  precise,  suppose  that  the  /-th  component  of  ^g  (^5*=)  maps  to  the  7rs(z)-th  (nsc(z)-th) 

component  of  $o;  then,  the  Pth  component  of  yy,  (,\sc)  maps  to  the  ns(f )-th  (nsc(i)-th)  component 
■‘The  choice  Ws  =  I,  so  that  j(a')  =  .J,  is  universally  valid,  though  of  virtually  no  practical  value,  owing  to  the 
high  dimension  for  x{s)  to  which  it  leads. 
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of  yq.  It  will  sometimes  prove  convenient  to  (lenot('  Iw  Vsa,., :  we  freely  use  both  forms. 

^Now.  in  lieu  of  (13).  we  .seek  IT,  matrices  SHti.sfyine; 

W,  =  are;  min  p  (  V-.o,  •  V.a,,- •  ■  •  :  \.sq,+i  J  •  (14) 

where  is  the  set  of  matrices  having  A,  or  fewer  rows  (and  a  number  of  columns  implicitly  defined 
by  conte.xt).  We  refer  to  (14)  as  the  decorrelation  problem.  Once  values  for  the  W,  matrices  have 
been  found,  we  mimic  our  approach  to  the  full-order  realization  problem:  values  for  the  multiscale 
parameters  P{0),  A(-)  and  Q{-)  are  .set  using  analogues  to  (9),  (11),  and  (12)  in  which  is 
replaced  by  and  Pxi,^,y  is  replaced  by  Pcki^xs^-  Tims,  our  reduced-order  modeling  procedure  is 
very  similar  to  our  three-stage,  full-order  modeling  procedure  (see  Section  3.1),  with  the  principal 
exception  being  that  now  condition  (14)  is  used  in  lieu  of  condition  (13). 

There  are  several  comments  to  make  about  this  modeling  approach.  First,  the  approach  shares 
with  its  full-order  counterpart  the  computational  benefit  that  we  can  find  all  the  model  parameters 
in  a  single  sweep  from  coarse  to  fine  scales,  determining  for  each  node  as  we  go  along,  and 
thereby  implicitly  specifying  P(0).  .4(-)  and  Q{-).  We  emphasize,  though,  that  the  condition  (14) 
is  a  heuristic  one.  Certainly,  this  condition  is  reasonable,  from  a  myopic,  node-by-node  view  of  the 
realization  problem;  however,  the  condition  does  not  provide  tight  control  over  the  overall  match 
between  P^^  and  P-^g.  Indeed,  an  open  research  challenge  is  to  find  a  way  to  build  a  reduced-order 
model,  in  which  the  parameters  P(0),  .4(-)  and  Q{-)  are  chosen  explicitly  to  minimize  some  global 
measure  of  the  discrepancy  between  P^g  and  P^g.  This  problem  appears  to  be  very  challenging. 
We  will  focus  only  on  the  more  myopic  problem  of  solving  (14). 

As  an  additional  comment,  models  constructed  with  our  approach  will  not  in  general  be  internal 
realizations.  In  other  words,  (8)  will  not  hold  in  general.  Consequently,  in  reduced-order  models, 
the  PFi  matrices  should  be  interpreted  as  merely  auxiliary  constructs,  which  aid  in  setting  values 
for  the  parameters  P(0),  A(-)  and  Q{-). 

Finally,  the  definition  of  the  generalized  correlation  coefficient  makes  it  clear  that  for  any  given 
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matrix  H ", , 

PiXsai^Xsa-i . p( \sn\ •  \sa2^  •  •  •  ^XsOq^y  |  ^^sXs) 

where  VV"'  is  a  matrix  whose  rows  form  an  orrhouorraal  basis  for  the  row  space  of  Wg.  Hence, 
defining  the  set  jVx  to  be  the  sub.set  of  Mx  liaving  orthonormal  rows,  we  see  that 

min  piXsai  1  X50:-2  ’  *  ‘  n^V*)  =  ,min  p(\sq;,  Xsa2  1  )  XsQtq-i-i  I  Wx.). 

WeMxg  vV  6/VX5 

Thus,  without  loss  of  optimality,  we  can  replace  the  constraint  set  Mx,  in  (14)  with  the  set  Mx,- 
When  convenient,  we  will  freely  make  this  replacement. 


4  Decorrelating  Sets  of  Random  Vectors 

4.1  Decorrelating  a  Pair  of  Random  Vectors 

We  here  analyze  a  special  case  of  the  decorrelation  problems  in  which  there  are  only  two  vectors  to 
decorrelate.  Denoting  these  vectors  by  r?i  and  i]-2  and  stacking  them  as  77  =  (rji'  rj^)^  our  objective 
is  to  find  the  optimal  matrix  solution  to  the  following  optimization  problem: 

H"  =  arg  min  p{in.p2\Wr}) .  (15) 

W^M\ 

Playing  a  central  role  in  the  solution  is  a  standard  result  from  canonical  correlation  theory.  For 
the  purposes  of  stating  this  result  precisely,  we  denote  the  rank  of  the  n;  x  n,  covariance  matrix 
Prf-  by  mi  (for  i  =  1,  2),  and  the  rank  of  P,,;,,,  by  11112-  Also,  we  let  /„  be  an  identity  matrix  having 
n  rows  and  columns. 


Theorem  1  There  exist  m.atrices  Ti  and  Pj.  of  dimension  mi  x  ni  and  m2  'xn-i,  respectively,  such 
that 
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The  matrix  D  has  dimension  m  i  x  ni  i  and  is  given  by  D  =  diay  (^D,  0  j  ,  where  D  =  diag(di ,  ^2,  ■■■,dmi2 ) 
1  >  >  <h  >  •••  >  dmy,  >  0;  for  a  given  [Pqi-  Pq-z  -  P^im)-  '^o.tT'i^  D  is  unique.  Finally.  TF  is 

the  Moore-Penrose  ■pseudoinverse  of  T,,  and  is  given  by  =  Pr^Tf ,  (i  =  1,  2). 


We  refer  to  the  triple  of  matrices  (Ti,T2,D)  as  the  canonical  correlation  matrices  associated 
with  (7?i,7?2)-  For  convenience,  we  introduce  truncated  versions  of  Ti  (for  i  =  1,2),  denoted  by  Ti^k 
and  defined  to  contain  the  first  k  rows  of  T;;  as  a  special  case,  we  define  T)  to  contain  the  first  mi2 
rows  of  Ti.  Results  very  similar  to  Theorem  1  can  be  found  in  several  places,  including  [3,  14,  15]  and 
[6].  A  proof  of  the  theorem,  as  exactly  stated  here,  can  be  found  in  [9].  As  these  proofs  reveal,  the 
calculation  of  the  canonical  correlation  matrices  can  be  carried  out  in  a  numerically  sound  fashion 
using  the  singular  value  decomposition;  this  calculation  requires  0{N^)  floating  point  operations, 
where  N  =  max(ni,n2). 

Theorem  1  can  be  used  to  perform  a  change  of  basis  on  the  vectors  pi  and  772,  to  simplify 
maximally  the  correlation  between  them,  and  thus  to  simplify  analysis  of  the  decorrelation  problem. 
We  define  the  random  vectors  p.  pi  and  po  via 

IJ-  =  (^  pj  p"^  ^  =  T^gu  (i  =  l,2), 

where  thanks  to  Theorem  1,  pi  and  po  have  covariance 

Pn 

\  ^  "  VI  /  \  -‘m2  / 

and  the  transformation  from  (771. 770)  to  (pi.p-i)  is  invertible  in  a  mean-square  sense, 

,^[(77,;  -  TrPi)i>ii  -  Ti^pif]  =  0,  (i  =  l,2). 

The  following  lemma  now  provides  the  key  simplification. 
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Lemma  1 


pi'll- '12  I  ILTf/j  =  pipi-pi  I  U'/i)  (16a) 

P^'li-'h  I  ■=  pi  in.  11-,  i  WT'^fj.)  (16b) 

^^iniu  pi  III- in  I  IT//)  =  ^luia  /If/ii./i-i  |  V'/x)  (16c) 

As  a  special  case  of  (16a)  and  (16b),  we  note  that  p{T]i.r]2)  =  pim,  p-i).  The  lemma  is  a  direct 

consequence  of  the  definition  of  the  generalized  correlation  coefficient,  together  with  Theorem  1; 
we  omit  the  details  of  the  proof. 

Equipped  with  the  foregoing  theorem  and  lemma,  we  can  now  solve  (15). 

Proposition  2  For  0  <  A  <  /?!ij  and  for  i  =  1,  2. 

min  p(7?i,  772  I  IE/;)  =  min  p{in-'l2  \  =  Pim^m  IT^xVi)  =  dx+i-  (17a) 

W^M\ 

For  A  >  mi2, 

min  p(77i,7?2|1E7/)=  min  p(7?i,  7?2  |  l'Ei7?i)  =  p(77i,  %  |  Tit/i)  =  0.  (17b) 

weM\ 


Proof:  In  Appendix  1,  we  demonstrate  that  for  A  <  77112, 


min  p(/xi,/i2  I  lEju)  =  min  p(/ii.;x2  1  lEiMi)  =  p(mi>M2  I  1;^  0  Pi)  =  dx+\,  (18a) 

w&Mx  v't'ie.Vx  V  / 

while  for  A  >  mi2, 

min  p(/ii,/i2  I  lEp)  =  min  pl/zi./to  !  lEiPi)  =  p(pi.P2  I  (  0  ]  Pi)  =  0-  (18b) 

vVi6Aa  \  / 

Once  these  facts  are  established,  the  results  (17a)  and  (17b)  then  follow.  In  particular,  with  regard 
to  (17a),  we  have  the  following  sequence  of  identities: 

min  p(7?i,r?2  I  IE;/)  =  inin  p(/;i. /x;  1  lE/x)  =  p(/xi,/X2|  (  0  )  Pi) 

Vv6vWa  c.V.x  y  / 

=  Pi'h-'hl  (  Ix  0  ^Tipi)  =  dx+i-  (19) 
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The  first  equality  follows  from  (IGo.  the  second  from  the  third  from  (16a)  and  the  fourth 

from  (18a).  The  result  (17b)  can  be  proved  from  (Isb)  in  a  very  similar  fashion;  the  details  are 
omitted.  QED. 

There  are  two  important  points  to  note  about  this  proposition.  The  first  is  that  solving  (1-5) 
is  essentially  a  problem  of  calcidating  the  canonical  correlation  matrices  associated  with  (771,  772); 
indeed,  (15)  can  be  solved  simultaneously  for  all  values  of  A  by  calculating  just  once  these  canon¬ 
ical  correlation  matrices.  The  second  point  is  that  there  is  no  harm  in  having  the  decorrelating 
information  1^77  be  a  linear  function  of  either  771  or  772  alone. 

4.2  Decorrelating  Multiple  Random  Vectors 

We  now  turn  to  the  general  decorrelation  problem  (14).  for  which  we  develop  a  suboptimal  solution. 
This  solution  has  an  intuitively  appealing  structure  motivated  by  the  solution  to  the  simpler  problem 
(15).  We  emphasize  that  to  the  l^est  of  our  knowledge,  the  task  of  characterizing  the  optimal  solution 
to  (14)  is  an  unsolved  problem. 

Our  approach  is  to  decompose  the  decorrelation  problem  into  a  collection  of  q  sub-problems.  In 
the  7-th  sub-problem,  we  focus  on  decorrelating  x,^a.  from  Xsaj  for  all  j  ^  i:  specifically,  we  exploit 
Proposition  2  to  solve 

Wi.it,  =  arg  min  /5  {.Ysa,,  X(sai)- 1  ,  (20) 

v  V 

where  for  now  we  treat  fci, . . .  ,  A;,,  as  free  parameters.  By  choosing  Wi^ki  as  in  (20),  we  effectively 
decorrelate  Xsai  from  Xsaj  (fot  all  j  7^  /)  all  at  once;  in  particular,  it  is  clear  that 

P  {Xsai,  Xsaj  I  \.-a,  )  <  P  (\sa,  •  i  ,  J  #  b  (21) 

and  so,  if  the  right  side  of  (21)  is  small,  then  the  left  side  will  also  be  for  all  j  i. 

To  see  how  we  combine  ll'i,;,-! . to  solve  (14)  approximately,  let  us  consider  the  quantity 

P  (Xsfvi . \snq  I  Rl.A'i  ■  -i  Xsq, )  i  (22) 
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which  we  can  express  more  succinctly  ;is  fA . X^a,,!  . kfi)Xs)  by  defining  the  block- 

diagonal  matrix  IV^iki . Av,)  =  diagdl'i  c-, . Siiice  the  i-th  block  component  of  this 

matrix  has  been  specially  designetl  to  decorrelate  from  j  7^  i.  we  intuitively  expect  that 
all  the  block  components  will  work  together  to  make  (22)  small.  Furthermore,  if 

•I 

Y^k,  <  A„  (23). 

1=1 

then  Ws{ki, ...  ,kq)  ^  -^a,:  implying  that  lV'3(A:i, . . .  ,  k^)  is  in  the  feasible  set  of  the  optimization 
problem  (14),  and  can  indeed  be  used  as  an  approximate  solution  to  (14). 

To  characterize  precisely  the  behavior  of  lF.s(A-i, ....  Ay^),  we  first  must  establish  a  result  describ¬ 
ing  the  non-increasing  nature  of  the  generalized  correlation  coefficient  as  the  amount  of  conditioning 
information  increases: 

Proposition  3 

piVi-m\Wiri,)  <  pivuvo),  i  =  l,2.  (24) 

Proof:  In  Appendix  B,  we  demonstrate  that 

P{Pl-P2\^ViPl)  <  p{p\:P2)-  (25) 

Once  this  fact  is  established,  the  I'esult  (24)  follows.  In  particular,  we  have  the  following  sequence 
of  relations: 

p{r]l,P2\WiPi)  =  p  (/xi./<2  I  <  piPuP-i)  =  pivi^m)- 

The  first  relation  follows  from  (16b).  the  second  from  (25)  and  the  third  from  (16a).  QED. 

We  emphasize  that  if  the  conditioning  information  is  not  a  function  of  either  rji  or  772  alone, 
then  the  function  p(-,-  |  ■)  may  become  an  increasing  one.  For  instance,  if 


( p  p  \ 

^  \  0.5 '' 

\  ^ni'i-2  ^'n  / 

^0.5  1  j 

then  p(7?i,7?2)  =  0.5,  but  p{t]i,  >]2  \  Pi  +  >h)  =  1- 
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We  can,  however,  slightly  strengthen  Proposition  3  by  relaxing  our  restriction  that  all  of  the 
conditioning  information  be  a  linear  function  of  either  rji  or  ij-r,  in  lieu  of  this  restriction,  we  restrict 
each  individual  scalar  cow-ponenf  of  this  conditioning  information  to  be  a  function  of  either  rji  or 
q2-  We  state  this  result  as  a  corollary: 

Corollary  1 

p{vi,V2  \Wiqi^,W2nh)  <  ;?2  I  Wi%i) .  (21,22)  €  {{1,2}  x  {1,2}} 


Proof: 

p{m,V2  \Wiqi^,W2Vi^)  =  pirn  -  E{in\Wit]i^)  .ri2  -  EipilWiPi^)  \  W2  -  E  {qi^\Wiqi^))) 

<  p(;/i  -  E  (//ilWit/i,)  .//2  -  E  {in\Wipi^)) 

=  p(/h- '?2  I 

The  first  and  third  lines  here  represent  direct  applications  of  (6),  while  the  second  line  represents 
application  of  Proposition  3.  QED. 

Using  this  corollary,  we  now  return  to  consideration  of  the  behavior  of  Ws{ki, . . .  ,  kq). 
Proposition  4 

/2(Xsa:i  >  •  •  •  ’  I  ^^siki . kq)\s)  <  jnax^  p(ysQ,; ,  jc  |  Ws(fci,...  ,kq)\s')  (26a) 

E:  rnax  p{XsaiiX(saiY  i  ^5  ,kiXsai)-  (26b) 

1  =  1,...  ,q 

Proof:  The  first  inequality  in  (26)  is  a  direct  consequence  of  the  definition  of  the  generalized 
correlation  coefficient.  The  second  is  then  a  direct  consequence  of  the  corollary  to  Proposition  3. 
QED. 

The  important  point  to  note  about  this  proposition  is  that  Ws{ki,...  ,  kq)  leads  to  a  value 
for  the  objective  function  in  (14)  that  is  no  greater  than  the  maximum  of  the  values  obtained  in 
the  q  sub-problems  (20).  In  other  words,  by  concatenating  together  the  solutions  to  the  q  sub¬ 
problems  (20)  into  the  block-diagonal  matrix  Ws(ki,...  ,  kq),  we  obtain  an  approximate  solution 
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to  (14)  having  a  value  upper-bounded  by  rhe  maximum  value  of  these  q  solutions  to  (20).  This 
observation  suggests  a  way  to  select  value.s  for  the  parameters  /ji,.  ..  ,  A:,.  In  particular,  subject  to 
the  constraint  (23).  ive  should  choose  these  parameters  to  fulfill  the  following  miniiucix  condition: 


(A*,....Avp  =  arg  min  max  p(\,„..  |  IV'.*,  V,„,)  i  .  (27) 

^•1 . ^  1'==' . 'I  J 

By  choosing  the  ki  parameters  in  this  fashion,  we  minimize  the  right  side  of  (26b),  which  upper- 
bounds  the  left  side  of  (26a).  The  matrix  B*s(/.:J',. . .  ,  k*)  then  can  serve  as  a  suboptimal  solution 
to  (14). 


To  describe  the  solution  to  (27),  we  denote  by  (Tsq;  ,  i  )  the  canonical  correlation  matri¬ 
ces  associated  with  (Xsa,.,  X(sai)‘=)-  where  the  diagonal  elements  of  Dsj  are  denoted  by  . . . . 

For  simplicity  of  exposition  only,  we  assume  that  Ay  is  strictly  less  than  the  rank  of  the  cross¬ 
covariance  ■Pxsa.X(sa  )<=’  ^  =  1-  ■  •  •  <■/-  Then,  thanks  to  Proposition  2,  it  follows  that 


Pi\sai  -  X{saiy  I  ) 


Hence,  the  minimcix  definition  (27)  is  ecjuivalent  to  the  following,  where  we  again  impose  the 
constraint  (23): 

(AT...  ,A*)  =  arg  min  <  max  d1'\ 

This  discrete  optimization  problem  can  easily  be  solved,  once  the  canonical  correlation  quantities 
fsai,Ds,ij  associated  with  (xsa.r  \(soi)=)  calculated,  for  f  =  1, 2, . . .  ,q  [9]. 

4.3  Calculating  the  Canonical  Correlation  Matrices 

For  problems  of  practical  interest  to  us.  the  dimension  of  \'s  and  can  be  on  the  order  of  a 
thousand  (or  greater),  thus  prohibiting  the  exact  calculation  of  the  associated  canonical  correlation 
matrices  {Ts,Ds)-  However,  if  the  correlation  bet^veen  Xs  and  Xsc  has  a  certain  special  structure, 
then  we  can  achieve  a  substantial  reduction  in  the  complexity  of  the  computation.  In  essence,  we 
assume  that  the  correlated  component  of  \s  and  Xs"'  fives  in  some  low-dimensional  subspace  that 
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is  easily  defined:  we  then  do  all  of  our  coinpurarions  with  low-dimensional  random  vectors  that  live 
in  this  subspace,  and  thereby  achieve  our  complexity  reduction. 

To  l)e  more  precise,  we  introduce  the  random  vectors  /Cv  and  as 


/i^  —  and  O^cy^c. 


(28) 


Here.  and  ©s^  are  matrices  havin'^  having  full  row  rank  and  such  that 


E 


(\.  -  £'(\.l/i))  {.\v 


EixMf 


0, 


(29) 


for  both  /i  =  Us  and  ji  =  It  is  not  difficult  to  see  that  if  (T(^,22  ,  T)^)  are  the  canonical 
correlation  matrices  for  {fig,  fJ's<=)-  then 


fs  =  f;''©,,  and  Ds  =  D^.  (30) 

This  result  leads  to  considerable  computational  savings  when  xo  is  a  wide-sense  Markov  ran¬ 
dom  process  or  field.  Even  if  \o  represents  a  WSMRF  as  large  as  256  x  256,  then  the  canonical 
correlation  matrices  associated  with  (xs,  Xs'=)  can  be  computed  in  a  manageable  fashion  to  machine 
precision.  Moreover,  for  non-Markov  processes  and  fields,  a  slight  generalization  of  this  approach 
serves  effectively  as  a  method  for  obtaining  good  approximate  results. 

We  illustrate  the  approach  by  considering  a  2-D  example.  In  the  example,  we  let  yo  represent 
the  values  of  a  first-order,  scalar- valued  WSMRF  over  a  discrete  lattice  having  dimensions  256  x 
256.  We  focus  on  a  particular  node  s  for  which  and  contain  the  values  of  the  field  at  the 
subsets  of  points  displayed  in  Figure  2a.  Specificalh^  ys  contains  the  values  of  the  field  at  the  64 
grid  points  marked  with  circles,  both  filled  and  not  filled,  in  the  white  region,  while  y^c  contains 
the  values  at  the  all  other  grid  points;  subsets  of  these  other  grid  points  are  marked  with  squares, 
both  filled  and  not  filled. 

Thanks  to  the  Markov  property,  we  can  devise  by  inspection  matrices  ©s  and  ©jc  to  fulfill  (29). 
In  particular,  we  can  let  ©„.  and  ©^c  be  selection  matrices  chosen  such  that  fig  and  iJ-gc  contain  the 
values  of  y^  and  Xs‘=  at  their  respective  boundary  points,  where  these  boundary  points  are  marked 
with  filled-in  circles  and  squares,  respectively.  To  see  the  computational  savings  that  can  result  by 
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Figure  2:  Illustration  of  our  approach  to  finding  the  canonical  correlation  matrices  associated  with  (Xs-X»'0  for  (a) 
a  first-order  WSMRF,  and  (b)  a  non-Markov  random  field. 


using  (30)  to  calculate  T,  and  D^.  we  note  that  the  dimension  of  is  roughly  5  x  10“'*  times  the 
dimension  of  tliis  approach  reduces  the  computational  cost  of  determining  {Ts,Ds)  by  roughly 
a  factor  of  6  x  10^.  From  this  example,  the  structure  of  our  approach  should  be  clear,  for  any  case 
in  which  we  are  modeling  a  WSMRF. 

For  non-Markov  random  fields,  there  is  no  guarantee  that  the  correlated  component  of  (x^,  Xs‘=) 
can  be  captured  by  boundary  information  over  a  region  as  thin  as  the  one  used  in  our  foregoing 
example.  To  compensate  for  this  fact,  we  modify  our  approach  slightly  for  the  non-Markov  case. 
Our  modified  strategy  is  to  make  the  boundary  region  as  thick  as  possible  for  each  of  Xs  and  Xs‘=-, 
subject  to  the  constraint  that  the  resulting  vectors  jig  and  jj.gc  have  dimension  no  greater  than  some 
prescribed  limit.  Using  the  same  graphical  conventions  as  in  Figure  2a,  this  idea  is  illustrated  in 
Figure  2b,  where  132  is  the  limiting  dimension  of  both  fig  and  /ijc.  Once  /Xj  and  /is<=  have  been 
defined,  tve  proceed  exactly  as  in  the  Markov  case. 


5  Numerical  Examples 


In  this  section,  we  present  two  nunierical  examples  that  suggest  the  promise  of  our  modeling 
approach.  In  all  cases,  the  models  we  I^uild  are  indexed  on  quadtrees.  Also,  for  the  purposes  of 
calculating  the  canonical  correlation  matrices,  we  set  Qrows  =  260. 

5.1  Reduced-order  Representations  of  Isotropic  Random  Fields 

For  our  first  example,  we  consider  a  scalar,  wide-sense  stationary,  zero-mean,  isotropic  (but  non- 
Markov)  random  field  y{m,n)  that  is  of  interest  in  the  geological  sciences  [17].  The  correlation 
function  for  this  field  can  be  expressed  analytically  as  follows: 

f  1  -  3/2(r/f)  +  l/2(r/0^  0  <  r  <  £, 

RyyiiJ)  =  E[y{m  +  i,n  +  j)uiin,n)]  =  i?yy(r)  =  <  (31) 

0  r  >  i, 

where  r  =  ^  chciracteristic  length  of  the  function.  A  plot  of  this  function  for 

f  =  80  is  represented  by  the  solid  curve  in  Figure  4;  we  see  from  this  plot  that  there  is  significant 
long-range  correlation,  at  least  relative  to  the  total  grid  size  we  will  be  using. 

We  build  multiscale  models  to  realize  the  correlation  function  (31)  on  a  128  x  128  grid.  We 
build  four  models,  each  involving  a  different  constraint  on  the  state  dimension;  we  constrain  the 
state  dimension  to  be  no  greater  than  the  respective  values  64,  32,  16  and  8. 

In  Figure  3a,  we  display  as  a  contour  plot  the  exact  correlation  function  (31).  Then,  in  Fig¬ 
ures  3b,  c  and  d,  we  display  as  contour  plots  the  correlation  function  associated  with  our  multiscale 
models  of  order  32,  16,  and  8,  respectively.  Because  our  multiscale  models  have  reduced  order, 
they  lead  to  correlation  structures  thiit  cire  only  approximately  stationary,  and  thus  we  must  define 
carefully  what  is  being  plotted  in  Figures  3b.  c  and  d.  Towards  this  end,  we  let  <^q  denote  the 
random  vector  comprising  the  finest-scale  of  the  particular  multiscale  process  in  which  the  state 
vectors  are  constrained  to  have  dimension  no  greater  than  A;  we  thus  have  if®, 
denote  the  (z,  j)-th  component  of  by  S,o(i.j)  (for  i.j  =  0, 1, . . .  ,  127).  In  terms  of  these  conven¬ 
tions,  the  plots  in  Figures  3b,  c.  and  d  display  contours  of  the  function  Rx{-.,  •))  for  A  =  32,  16  and 
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8  respectively,  where 


Rxim,  n) 


1 


(12S  -  ;;; j(  126  -  n  j 


ijr-Ki  rj7-;i 

E  E 

1=0  j=0 


SO 


i  +  m.j  +  n 


(32) 


We  do  not  include  a  contour  plot  tor  our  model  ot  order  64.  because  for  orders  greater  than  just 
16,  our  multiscale  models  capture  virtually  all  of  the  significant  correlation  structure.  This  fact 
is  reinforced  in  Figures  4a  and  b.  where  we  display  horizontal  and  vertical  slices  of  these  contour 
plots. 

Let  us  consider  the  use  of  these  multiscale  models  to  carry  out  linear  least-squares  estimation. 
In  Figure  5a,  we  display  the  original  signal  that  we  will  be  attempting  to  estimate.  This  signal 
consists  of  128  x  128  pixels  and  has  a  Gaussian  distribution.  It  is  drawn  from  the  exact  distribution 
implied  by  (31)  with  f  =  80.  This  field  generation  is  effected  by  embedding  the  128  x  128  grid 
into  a  larger  256  x  256  toroidal  lattice,  and  extending  the  definition  of  Ryy{-,-)  to  have  periodic 
boundary  conditions;  for  i  =  80,  this  approach  leads  to  a  valid  (i.e.,  positive  definite)  correlation 
function. 

We  consider  two  estimation  problems  related  to  the  signal  in  Figure  5a.  For  the  first,  we  corrupt 
the  signal  with  spatially  stationary  white  noise  having  covariance  one,  thus  leading  to  an  SNR  of 
OdB  (since  the  signal  also  has  a  variance  of  one,  as  indicated  by  (31)).  In  Figure  5b,  we  display  an 
estimate  based  on  our  multiscale  model  of  order  64.  The  sample  MSE  here  is  0.0498.  While  there 
is  no  computationally  feasible  w'a}’  to  determine  the  mean-square  error  of  an  optimal  estimator  for 
this  problem,  we  can  obtain  a  fairly  tight  lower  bound  for  the  optimal  MSE.  In  particular,  let  us 
consider  the  problem  of  estimating  the  value  of  the  256  x  256  signal,  from  which  our  128  x  128 
signal  has  been  extracted.  Since  this  256  x  256  signal  is  stationary  and  is  indexed  on  a  toroidal 
lattice,  exact  calculations  are  possible.  In  particular,  for  estimating  this  signal  in  OdB  white  noise, 
the  optimal,  FFT-based  estimator  has  an  MSE  of  0.0458,  which  must  lower-bound  the  MSE  of  an 
optimal  estimator  in  our  original  estimation  problem.  By  comparison,  then,  our  measured  MSE  of 
0.0498  is  quite  satisfactory.  Although  not  shown  in  the  Figure,  the  same  level  of  performance  is 
also  achieved  by  our  lower-order  nmltiscale  models;  specifically,  our  models  of  order  32,  16  and  8 
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(a) 


(b) 


Horizontal  offset 


Horizontal  offset 


(c) 


(d) 


Figure  3;  These  four  figures  display  contour  plots  associated  with  Ryy{-,  ),  defined  in  (31),  with  the  contour 
levels  at  0.9,  0.7,  0.5,  0.3  and  0.1.  (a)  The  exact,  desired  correlation  function,  (b),  (c),  and  (d)  The  correlation 
function  associated  with  multiscale  models  of  order  32.  16  and  8,  respectively.  These  three  have  been  determined  by 
Monte-Carlo  simulation,  using  enough  trials  so  that  every  estimated  correlation  value  is  within  0.005  of  its  correct 
value. 
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(a)  (b) 

Figure  4:  Comparison  of  (a)  vertical  and  (b)  horizontal  slices  of  the  correlation  contour  plots  in  the  previous  figure. 
Again,  these  plots  are  based  on  Monte-Carlo  simulation,  where  each  point  is  within  0.005  of  its  correct  value  with  95 
percent  confidence. 

achieve  sample  MSEs  of  0.0501,  0.0533  and  0.0544,  respectively,  which  are  all  close  to  the  optimal. 

The  second  estimation  problem  we  consider  is  one  for  which  the  FFT  is  of  little  practical  use. 
In  particular,  we  consider  the  problem  of  estimating  the  signal  displayed  in  Figure  5a,  based  on 
noiseless  measurements  at  the  extremely  sparse  set  of  points  displayed  in  Figure  5c.  These  points 
provide  only  1.11%  coverage  of  the  image  region.  Their  irregular  distribution  is  the  key  reason 
that  FFT  techniques  are  not  useful.  On  the  other  hand,  in  Figure  5d,  we  display  the  estimate  that 
results  from  use  of  our  multiscale  model  of  order  64.  In  light  of  the  sparsity  of  our  measurement 
coverage,  this  estimate  has  impressively  captured  the  coarse  qualitative  features  of  the  true  signal; 
in  fact,  the  sample  MSE  of  this  estimate  is  only  0.1147. 

5.2  Reduced-order  Representations  of  Warped- version  of  Isotropic  Correlation 
Function 

For  our  second  e.xample,  we  build  miiltiscale  representations  for  a  stationary  random  field  having  a 
correlation  function  that  is  a  warped  version  of  the  isotropic  correlation  function  Ryy{kJ)  in  (31. 
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(c)  (cl) 

Figure  5:  These  four  figures  relate  to  linear  least-squares  estimation  of  a  signal  having  the  isotropic  correlation 
function  in  (31).  (a)  The  original  signal,  with  Gaussian  deviates,  drawn  from  the  exact  distribution  using  FFT-based 
techniques,  (b)  Estimate  of  the  sample  path  in  (a),  based  on  noisy,  densely  distributed  measurements  of  the  signal, 
with  OdB  SNR;  a  64-th  order  multiscale  model  is  used  to  obtain  thi.s  estimate,  (c)  Locations  of  olrserved  pixels,  for  a 
second  estimation  experiment;  these  observed  pixels  provide  only  1.11  ‘If  coverage  of  the  image,  (d)  Estimate  of  the 
sample  path  in  (a),  based  on  noiseless  observations  of  the  observed  pixels  (displayed  in  (c)). 


26 


Our  warped  version,  which  we  denote  by  is  rletined  ;is  follows: 


~  -j  )■ 
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1  0 
0  4 


(  ... 


cos  9  sin  9 
sin  9  cos  9 


w  / 
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(33a) 


(33b) 


The  characteristic  length  d  of  (see  (31))  is  again  set  to  ^  =  80.  A  contour  plot  of  Ryy{iJ) 

is  displayed  in  Figure  6a,  while  slices  of  this  correlation  function  along  the  directions  of  strongest 
and  weakest  correlation  are  displayed  in  Figures  7a  and  b,  respectively. 

We  consider  the  problem  of  building  a  multiscale  model,  indexed  on  a  quadtree,  to  realize  the 
correlation  function  (33b)  on  a  128  x  128  grid.  We  constrain  the  multiscale  model  dimension  to  the 
respective  values  of  64,  32,  16  and  8. 

In  Figures  6b,  c  and  d,  we  display  as  contour  plots  the  correlation  function  associated  with 
our  multiscale  models  of  order  32.  16,  and  8,  respectively.  .Just  as  in  our  previous  example,  we 
must  carefully  define  the  precise  meaning  of  these  contours.  As  in  the  previous  example,  we  let 
denote  the  random  vector  comprising  the  finest-scale  of  the  particular  multiscale  process  in  which 
the  state  vectors  are  constrained  to  have  dimension  no  greater  than  A;  we  thus  have  ^q,  and 
We  denote  the  (f,  j’)-th  component  of  by  ^Q{i,j)  (for  i,j  =  0, 1, . . .  ,  127).  In  terms  of  these 
conventions,  the  plots  in  Figures  6b,  c,  and  d  display  contours  of  the  function  Rx{-,  ■),  for  A  =  32, 
16  and  8  respectively,  where  Rx{-.-)  is  defined  in  (32).  We  do  not  include  a  contour  plot  for  our 
model  of  order  64,  because  at  this  order,  the  contour  plot  is  indistinguishable  from  the  ideal,  desired 
correlation  in  (a).  To  allow  for  more  direct  comparison  of  these  contours,  we  overlay  slices  of  them 
in  Figures  7a  and  b;  more  specifically.  Figure  7a  represents  a  slice  of  the  contour  plots,  along  the 
direction  of  strongest  correlation,  while  part  b  represents  a  slice  of  the  contour  plots  along  the 
direction  of  weakest  correlation. 

In  Figure  8,  we  display  sample  paths  of  this  random  field  using  Gaussian  deviates,  generated 
with  our  models  of  order  64,  32.  16  and  8.  We  see  that  unless  a  relatively  high  order  model  is 
used,  the  sample  paths  exhibit  visually  distracting  blocky  artifacts  at  the  quadrantal  boundaries. 


(c) 


(d) 


Figure  6:  These  four  figures  display  contour  plots  associated  with  ■),  defined  in  (33b),  with  the  contour 

levels  at  0.95,  0.85,  0.75,  0.6,  0.45,  0.3  and  0.15.  (a)  The  exact,  desired  correlation  function,  (b),  (c),  and  (d)  The 
correlation  function  associated  with  multiscale  models  of  order  32,  16  and  8,  respectively.  These  three  have  been 
determined  by  Monte-Carlo  simulation,  using  enough  trials  so  that  every  estimated  correlation  value  is  within  0.005 


of  its  correct  value. 


(a)  (b) 

Figure  7:  Comparison  of  slices  of  correlation  contour  plots  in  the  previous  figure,  (a)  .A.  slice  along  the  direction  of 
the  major  axis  of  the  ellipses  in  part  (a)  of  the  previous  figure,  (b)  .A  slice  along  the  direction  of  the  minor  axis  of 
the  ellipses  in  part  (a)  of  the  previous  figure.  .Again,  these  plots  are  based  on  Monte-Carlo  simulation,  where  each 
point  is  within  0.005  of  its  correct  value  with  95  percent  confidence. 

While  in  many  applications,  these  artifacts  are  devoid  of  any  statistical  significance,  they  may  be 
important  in  other  contexts.  One  way  to  eliminate  these  artifacts  is  employ  a  relatively  high-order 
model  multiscale  model;  for  instance,  as  shown  in  Figure  Sa,  the  64-th  order  model  is  effective  in 
this  regard.  An  alternative,  arguably  more  elegant  approach  to  eliminating  these  artifacts  is  to  use 
so-called  overlapping  tree  models,  in  which  distinct  tree  nodes  correspond  to  overlapping  portions 
of  the  image  domain;  this  idea  is  described  in  detail  in  [9]. 

6  Conclusions  and  Suggestions  for  Future  Work 

This  paper  develops  elements  of  a  theory  for  multiscale  stochastic  realization,  focusing  on  the 
problem  of  building  multiscale  models  to  realize,  either  exactly  or  approximately,  pre-specified 
finest-scale  statistics.  A  key  challenge  has  been  to  generalize  the  time-series  notion  of  state  vectors 
serving  as  an  interface  between  the  past  and  the  futui'e  of  a  random  process.  The  generalization  is 
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Figure  8:  These  four  figures  (.lisplay  sauiiile  [jatlis  of  a  random  field  liaving  the  correlation  function  given  in  (33b, 
for  a  128  x  128  pixel  region.  The  sainjile  paths  in  (a,l.  (I>),  (c)  and  (,dl  correspond  to  multiscale  morlels  of  order  64, 
32,  16  and  8.  respectively,  using  (lanssi.an  devitites. 
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made  by  introducing  a  generalized  correlation  coefficient,  which  is  used  to  make  precise  the  notion 
of  multiscale  state  vectors  serving  to  decorrelate  multiple  subsets  of  a  multiscale  process.  Once  the 
reduced-order  multiscale  modeling  problem  has  been  formalized,  we  harness  canonical  correlation 
analysis  to  develop  a  sub-optimal  model-ljuilding  algorithm.  We  demonstrate  the  practicality  of  this 
algorithm  in  problems  of  random-field  estimation  aiul  generation.  In  the  context  of  field  generation, 
we  demonstrate  an  ability  to  build  multiscale  processes  having  a  finest-scale  correlation  matching 
very  closely  desired  correlations.  In  the  context  of  field  estimation,  we  build  multiscale  models  that 
are  in  turn  used  to  carry  out  least-squares  estimation,  with  the  resulting  field  estimates  having 
nearly  optimal  mean-square  error. 

The  work  presented  raises  a  number  of  interesting  research  questions.  What  is  the  optimal 
solution  to  the  general  decorrelation  problem  addressed  in  Section  4?  Is  the  bound  on  state  di¬ 
mension  in  Proposition  1  tight?  In  other  words,  can  a  multiscale  model  be  devised  in  which  the 
bound  holds  with  equality  at  every  node?  If  not,  what  then  constitutes  a  minimal  realization  of  a 
given  finest-scale  covariance?  Is  the  class  of  internal  realizations  rich  enough  to  always  include  a 
minimal  realization?  This  last  question  is  particularly  intriguing,  because  in  the  time-series  case, 
the  answer  is  yes  [10,  16]. 

Finally,  other  interesting  questions  arise  when  we  consider  more  carefully  the  issue  of  inter-scale 
propagation  of  information  in  multiscale  processes.  Consider,  for  example,  the  problem  of  building 
a  multiscale  model  indexed  on  a  second-order  tree  (i.e.,  a  tree  for  which  q  =  2)  having  three  scales 
to  realize  exactly  the  following  finest-scale  covariance: 


^  1  0  0  0  ^ 
0  110 
0  110 
^  0  0  0  1  y 


(34) 
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One  possible  exact  realization  uses  the  following  values  for  the  internal  matrices: 

(^0100 

(^0  1  )  •  =  (^10 

“  Ifoa-^ax  ~  ll'0a2Q2  “ 

A  valid  alternative,  which  also  leads  to  an  exact  realization,  is  to  replace  Wq  in  (35)  with  Wq, 

=  diag(l,  1,1,1), 

while  retaining  the  same  values  as  in  (35)  for  the  other  internal  matrices.  There  is  an  important 
difference  between  the  models  that  result  from  these  two  choices  for  the  internal  matrices.  The  first 
choice  leads  to  a  model  in  which  coarse-scale  information  is  preserved  in  its  journey  to  the  finest 
scale;  in  particular,  we  see  that  x{s)  =  and  so  indeed  the  first  realization  is  internal.  On  the 

other  hand,  the  second  choice  leads  to  a  model  in  which  information  is  lost  in  its  journey  to  the 
finest  scale.  In  fact,  by  using  Wq  in  lieu  of  Wq,  we  have  somewhat  perversely  created  a  multiscale 
model  in  which  the  entire  finest-scale  process  is  generated  at  the  root  node,  and  then  some  of  this 
information  is  immediately  discarded  in  the  transition  to  the  middle  scale,  whence  new  values  for 
this  discarded  information  are  generated  in  the  transition  to  the  third,  finest  scale.  Although  the 
finest  scale  process  does  have  the  correct,  desired  correlation,  the  realization  is  not  internal®.  In 
particular,  x(0)  ^  Wq^q. 

In  light  of  this  example,  how  can  our  modeling  approach  be  refined  to  propagate  information 
more  explicitly  from  scale  to  scale?  One  answer  is  presented  in  [9],  though  the  suggested  approach 
is  not  numerically  practical.  As  a  related  issue,  how  can  we  extend  our  modeling  approach  to  realize 
jointly  pre-specified  fine  and  coarse  scale  statistics? 

All  of  these  unanswered  questions  highlight  the  fact  that  multiscale  stochastic  realization  is  a 
relatively  new  subject,  with  this  paper  serving  only  as  a  beginning.  Many  interesting  challenges 
remain. 

^This  example  demonstrates  that  condition  (13)  is  necessary  but  not  sufficient  for  an  exact,  internal  realization. 


n-o  = 
II' Oa,  = 
II*0a  1 A I  ~ 


32 


A  Proof  of  Proposition  2 


In  this  appendix,  we  complete  the  proof  of  Proposition  2.  by  establishing  the  validity  of  (18a)  and 
(18b).  For  this  purpose,  we  continue  to  use  the  notation  established  in  Section  4.1. 

We  begin  by  making  e.xplicit  the  connection  between  the  value  of  p(/zi,/i2  !  WiPi)  and  the 
cross-covariance.  D  between  and  p2-  To  proceed,  we  define  the  linear  least-squares  residual  /2 
via 

A  —  =/i  -  £'(/i  I  W/i),  (36) 

where  by  elementary  theory  of  linear  least-squares  estimation  theory, 

Pp  =  [WP^W^yHvP^  (37) 

For  this  analysis,  we  set  W  =  {Wi  0),  so  that  Wp  =  Wnn.  In  terms  of  (37),  we  then  define  the 
sets  F\  and  F2  as  follows: 

F.  =  {/;/^Pa./  =  1}  (i  =  l,2).  (38) 

Now,  using  the  definitions  in  Section  2.3,  we  find  that  if  either  F\  or  F2  is  empty,  then  piiii,  ij-2\W\ix\)  = 
0,  and  otherwise 

p(pi,M2  I  IFipi)  =  max  /f P^j/i2/2.  (39) 

/i€Fi,/26F2 

In  general,  the  sub-matrices  P^j,  P^.^  and  P/ii/ij  of  Pjj.  in  (37)  can  have  messy  analytical  forms. 
However,  if  Wi  has  orthonormal  rows  (i.e.,  Wi  €  Mx,  for  some  A),  then  simplification  is  possible. 
We  begin  by  noting  that 

p(Mi-M2  I  ITi/ii)  =  p(Mi./i2  I  fFi'pi)  (40) 

where  y  and  W[  are  related  respectively  to  pi  and  Wi  via  unitary  (but  otherwise  arbitrary)  matrix 
U: 

(i[  =  =  WiP,  (41) 
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Letting  be  a  matrix  whose  rows  form  an  orthonormal  basis  for  the  nullspace  of  Wi.  we  set 
U  —  (^^1^)^),  so  that  fFj'  =  {Ix  0).  Then,  thanks  to  (37), 


/  p-  p.  \ 


Pi 


(  ( 
\ 


J 


\ 


0  0 
0 

0 

V  WtD  j 

Finally,  adapting  (40)-(42)  to  (36)-(38),  the  following  lemma  results: 


7rii-A  j 

(  ^ 


Im^-D'^W^WiD 


(42) 


) 


Lemma  2  Let  Wi  6  M\,  0  <  A  <  mi,  and  let  IFp  be  a  matrix  whose  rows  form  an  orthonormal 
basis  for  the  nullspace  of  Wi .  Then, 

pil^i,l^2\Wipi)  =  max  {9{W^Dg2]  (43a) 

g2€G2  ^  ) 

=  m^  il  W^Dg2  II2,  (43b) 

92^G2 

where  G\  and  G2  denote  the  following  sets: 


Gi  =  /3  =  l}, 

G2  =  {ge  ;  /(4„,  -  D^Wl W^D)g  =  1 }  . 


As  a  direct  consequence  of  this  lemma. 


p{pi,P2  1  {h  0)m) 


dx+i,  0  <  A  <  mi2 

0  A  >  mi2 


(44) 


This  fact  establishes  (18b). 

What  remains  is  to  establish  (18a).  To  proceed,  we  temporarily  constrain  W  to  have  either  of 
the  two  forms 


W 


Wi  0 


or 


(45) 


and  we  find  a  matrix  W  €  jVx  that  minimizes  p{pi.p2  \  kF/x),  subject  to  this  additional  constraint. 
The  following  lemma  summarizes  the  key  result  here. 
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Lemma  3 


I  i  (^  /,^  0  j  fn)  =  p(fll.^2  \  0  j  P2) 

=  nnn  pipi-pi  \  \V2P2)  =-  c^A+i- 

Iv  2^- V  .\ 

Proof:  Thanks  to  (44),  it  is  sufficient  to  show  r.hat  for  all  W\  6  yV\, 

p{p\,H2\Wiiii)  >  f/A+1.  (46) 

To  establish  (46),  we  fix  Wi  £  .Va.  Referring  back  to  Lemma  2,  and  in  particular  (43a),  we  devise 
particular  values  for  gi  £  Gi  and  go  G  G2  for  which 

g[(W/-D)g2  >  d,\+u  (47) 

thus  implying  that  p(pi,p2  I  Wig.i)  >  d\+i. 

To  establish  (47),  we  first  note  that  at  least  one  of  the  unit  vectors  ef ,  e^, . . .  ,  must  belong 
to  the  row  space  of  W^,  which  itself  has  a  dimension  of  mi  -  A;  let  us  suppose  that  ej  belongs, 
with  h^W-^  =  eJ  for  some  h  £  77."^*  Now,  exploiting  the  orthonormality  of  the  rows  of  Wj^,  we 
see  that  h  £  Gi,  and  hence  we  let  <71  =  h.  Also,  we  let  52  =  Cj,  where  the  fact  that  Dej  =  djCj  = 
dj(W^)'^gi,  implies  that  WiDej  =  0,  so  that  indeed  ej  £  (j2-  But  for  these  values  for  gi  and  g2, 
gf  (W^D)g2  =  dj  >  d\+i,  thus  establishing  (47)  and  completing  the  proof.  QED. 

Next,  we  establish  that  in  fact  there  is  no  loss  of  optimality  in  the  additional  constraint  in  (45). 
The  following  lemma  summarizes  the  key  result. 

Lemma  4  For  any  matrix  W  £  there  exists  a  pair  of  matrices  Wf  £  and  W2  £ 

■with  Ai  +  A2  <  A,  such  that  plpi.iao  I  ^L’l/zi,  R2/i2)  <  p{pi,p2  |  Wg). 


Proof:  We  begin  by  expressing  the  matrix  W  in  terms  of  its  constituent  rows  as 

T 

ITi  Wo  ■■■  Wa 


W 


where  the  column  vector  Wi,  for  i  =  1,  2, . . .  ,  A,  can  itself  be  decomposed  as 

(  W, 


W  = 


d 

\  Wu 
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Let  us  suppose  that  for  some  particular  i.  say  /'i,  11',^  j  yi  0.  and  ^  0.  We  demonstrate  that 

can  be  replaced  with  one  of  the  two  vectors  I'l  or  I  -),  where 


(  ”  'm  .1 

(  °  ^ 

and  = 

1  0 

) 

V  ^^.2 ; 

with  no  incurred  increase  in  the  value  of  piixi.ji-,  |  W p).  after  the  replacement. 

We  now  define  p,  II2  and  as  in  (36)  and  (37).  From  (37),  it  immediately  follows  that 

=  0,  (48) 


which  in  turn  implies  that  P^iWi  =  0,  so  that 

rT  D_  T,T'  _  t.t’7’  d  t.i/  _  ixrT 


There  are  now  two  possibilities;  either  ^  P^.  >  0  ,  or  =  0-  The  first  implies 

that  p{p,i,fj,2  \Wjj,)  =  1,  in  which  case  there  can  certainly  be  no  harm  in  our  replacement  strategic 
The  second  implies  that  Vi  =  0,  and  P^j^s  =  0i  which,  in  turn,  means  that  there  exist 
unique  vectors  hi  and  /12  iii  such  that  Vi  =  {i  =  1,2).  Now,  by  exhaustively  considering 

the  possibilities,  one  can  verify  that  at  least  one  of  hi  and  /i2  must  have  a  non-zero  value  in  its 
fj-th  component,  for  otherwise,  W  could  not  have  full  row  rank.  If  hi  (/12)  has  this  property,  then 
we  can  replace  Wi^  with  Fj  (F2)  with  no  change  in  the  value  of  p{p,i,(i2  |  W p).  QED. 

To  make  clear  the  conseciuences  of  this  lemma,  let  us  suppose  that  W*  6  minimizes 
p{fii,P2  I  Then,  from  the  lemma  we  know  there  exist  matrices  W*  €  and  W2  G  Mx^ 

(for  some  Aj  and  A2  such  that  Ai  4- Ao  <  A)  such  that  p(fii,/i2  \  W*fj,)  =  p{pi,ij,2  |  W*fii,W2fi2)- 
Then,  fixing  W*,  let  us  define  pi  =  ?/,-  —  P(/i,;  |  lF*^i),  {i  =  1,2),  which  we  use  to  see  that 

p{pi,P2\W*n)  =  min  p(;ii- li2  |  WC/ti,  W2M2)  =  „  min  p(/ii,M2  |  W2/22) 

IVoc.Wx.^  W2€Mx2 

=  .min  p(/ii./i2  I  WiMi)  =  .min  p(/ii,/i2  I  Wj/ri) 

=  p(pi-P2  I  IFi/ii). 

VV 1 

The  first  equality  follows  directly  from  Lemma  3.  The  second  follows  from  the  definition  of  pi,  while 
the  third  line  follows  from  Lemma  2.  The  fourth  equality  follows  again  from  the  relation  between 
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jli  and  fii-  and  finally,  the  fifth  follou-.s  from  the  fact  that  both  of  the  vectors  of  conditioning 
information  in  the  fourth  line  of  functions  only  of  /ij.  The  proof  of  Proposition  2  is  now  complete. 
QED. 


B  Proof  of  Proposition  3 

In  this  appendix,  we  complete  the  proof  of  Proposition  3,  by  establishing  the  validity  of  (25).  For 
this  purpose,  we  continue  to  use  the  notation  established  in  Section  4.1. 

We  begin  by  fixing  Wi,  which  we  assume  without  loss  of  generality  to  have  orthonormal  rows. 
We  know  from  (44)  that  p(/ii,^2)  =  d\.  Combining  this  fact  with  (43b),  it  follows  that  the 

Proposition  will  be  proved  if  we  can  show  that 

max  II  W^Dg2  II2  <  (49) 

52€G’> 

To  establish  (49),  we  first  note  that  since  the  rows  of  form  an  orthonormal  basis  for  the 
null  space  of  Wi,  we  have  that  Vx, 

II  X  111  =  II  Wix  Hi  +  II  Wi-x  Hi  .  (50) 


Since  V52  €  C?2, 


g^^{I  -  D'^WlWxD)g 


92  Hi  -  II  W,Dg2  Hi  =  1, 


(51) 


we  can  apply  (50)  in  (51)  with  x  =  Dgo  to  see  that  Vg2  €  G2, 


W,^Dg2  Hi 


D92  II2  “  II  92  111  +1’  ^92  €  (J2. 


(52) 


But 


min  { II  52  III 

<72€Cj2 


„  ,2,  .  f  sUl-D'^D)g2 

=  SS,U{r-Dnvrw,D)g2 


'72#0 


T 

92  92 


92  92 


g^^o\gl{I  -  D^W^WiD)g2 


] 


=  eig„„„  (/  -  D^D)  eig„,,„  [(/  -  D'^W^W^D) 

=  (1-^?)(1) 


-1 


(53) 
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where  eig„,j„(-)  denotes  the  smallest  eigenvalue  of  the  enclosed  matrix  expression.  In  the  third 
line,  we  have  used  Rayleigh's  principle  [18].  which  asserts  that  for  any  pair  of  symmetric,  positive 
definite  matrices  .4  and  B. 

Bx 

mm-y— -  =  eig„„„(.4  B). 

T5=0  .4x 

By  combining  (52)  and  (53),  the  desired  result  (49)  is  established.  QED. 
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